September 17, 2018

Roadmap

  • How to use R Markdown? Tutorial with example document.
  • How to use Git and GitHub for version control?

Administrative

New Students - Welcome!

Waitlist

  • A few students were added from the waitlist.
  • The waitlist is now closed and no longer active for the Fall 2018 term.
  • Students are unlikely to be added after this week.

GitHub usernames

Office Hours set

Advanced Material

  • To allow advanced students or students with interest in specific topics to go a bit further than what is covered in the course, I am including a section called Advanced Topics (optional, on your own only) on the syllabus for each week.
  • Feel free to completely ignore these sections!
  • These sections are meant for self-study. The TAs and I are first and foremost concerned with the material covered in the course.

Types of course registration

  • For credit, graded: Required to hand in all exercises and complete the final project/exam.

  • Pass-Fail: Same process as graded students. I submit a letter grade which the registrar converts to either P or F.

  • R registered students (audit):
    • Complete 2 of the assignments from week 3 or later.
    • No need to complete final project.

RStudio and R Markdown

Using R in RStudio

Typical new R user

Literate R programming with R Markdown

R Notebooks

R Markdown

  • I asked everyone to familiarize themselves with R Markdown.
  • Success?

R Markdown Cheat Sheet

  • Use the R Markdown cheat sheet for a quick overview of Markdown in RStudio.

Reporting in R Markdown

  • Try it yourself:
  • File -> New File -> R Markdown -> Document -> HTML
  • Set output to different file formats (with the Knit button)

What is Markdown?

Markdown Syntax

Simple Formatting and Links

Headers

To create titles and headers, use leading hashtags (#). The number of hashtags determines the header's level:

# First level header
## Second level header
### Third level header

Lists

To make a bulleted list in Markdown, place each item on a new line after an asterisk and a space, like this:

* item 1
* item 2
* item 3

You can make an ordered list by placing each item on a new line after a number followed by a period followed by a space.

1. item 1
2. item 2
3. item 3

Embedding equations

You can also use Markdown syntax to embed LaTeX math equations in your reports. To embed an equation in its own centered equation block, surround the equation with two pairs of dollar signs, like this:

$$1 + 1 = 2$$

To embed an equation inline, surround it with a single pair of dollar signs, like this: $1 + 1 = 2$

Standard LaTeX math symbols work.

Knitr

knitr is an engine for dynamic report generation with R and is used to convert (or "knit") R Markdown files into the desired output format.

Including R code inline and in chunks

  • R code can be included as a chunk with

    ```{r} ```

or inline, between single backticks with a leading r (for example, `r 1 + 1`).

  • R functions sometimes return messages, warnings, and even errors. By default, R Markdown includes messages and warnings in your report, while an error stops the rendering. Use the chunk options message = FALSE and warning = FALSE to hide messages and warnings, and error = TRUE to keep rendering and show the error in the report instead.

Popular chunk options

Pandoc

  • The Pandoc program (by John MacFarlane) converts the Markdown file produced by knitr into the output format we want.
  • We can include specific options in the YAML header at the top of our document to control this process.
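As a minimal sketch of such a header, the shell snippet below writes a small R Markdown file whose YAML block requests an HTML document with a table of contents. The file name report.Rmd, the title, and the chosen options are assumptions for illustration and are not taken from the course material.

```shell
# Write a minimal R Markdown file. The block between the two --- lines
# is the YAML header; file name, title, and options are illustrative.
cat > report.Rmd <<'EOF'
---
title: "My Report"
output:
  html_document:
    toc: true
---

Text of the report goes here.
EOF

# To render it (requires R with the rmarkdown package installed):
#   Rscript -e 'rmarkdown::render("report.Rmd")'
```

Changing html_document to another format from the list of output formats switches the output without touching the body of the document.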

R Notebooks

Switch to the demo file Rmarkdown_demo.Rmd in the exercise folder for this week.

R Notebooks

  1. Interact with R in a single, seamless stream.
  2. Iterate quickly on code and output; see code and output together.
  3. Leave a clean, reproducible record of your analysis in a simple text file.
  4. Document your analysis with rich, literate prose.
  5. Share and publish easily.
  6. One-click export to PDF, Word, etc.

Source: RStudio Webinar "Introducing Notebooks with R Markdown"

Other Output Formats

  • html_document
  • pdf_document
  • word_document
  • beamer_presentation / slidy_presentation / ioslides_presentation
  • github_document

Note: see also description in exercise document.

Version Control with Git and GitHub

Why version control?

  • Confusing
  • Prone to errors
  • Collaboration
  • Describe the changes
  • Feature development

Why Git?

  • Git
    • master-branch workflow
    • distributed (rather than centralized) version control
    • pull requests to manage/discuss updates
    • de facto standard for version control
  • GitHub
    • GitHub is like Facebook for programmers. Everyone's on there.
    • open source
    • lowers the barriers to collaboration

Resources to get started with Git and GitHub

Intro Git - version control

  • Version control
    • VC is a great way to keep track of changes in code, manuscripts, presentations, and data analysis projects.
    • Allows you to save and annotate all changes to your code and files.
    • No need to rename files as "analysis_v1.R", "analysis_with second graph.R", "analysis_Mike update.R"

Intro Git - Local vs. Shared Repository

Intro Git - Master-Branch

Intro Git - Centralized vs. Distributed Version Control

Intro Git - Popularity

Intro GitHub

  • single largest host for Git repositories, and is the central point of collaboration for millions of developers and projects
  • Free. Allows easy open source hosting. Private repositories available with your .edu email address.
  • While Git is a command line tool, GitHub provides a web-based graphical interface.
  • Provides access control and several collaboration features, such as wikis and basic task management tools for every project.

Why / For what should you use GitHub?

  • Use GitHub for your homework exercises
  • Get the class material and resources on GitHub.
  • Find useful repositories, packages, data, code, and tutorials: e.g. tidyverse, Hadley Wickham's "Advanced R" book etc.

Tutorial - How to use GitHub

Creating a new repository

Cloning a repository

  • A clone is a copy of a repository that lives on your computer instead of on a website's server. Cloning is the act of making that copy.
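A minimal sketch of cloning, written so it runs offline: a bare repository in a scratch directory stands in for the copy on a server. The directory names server.git and my-copy are made up for this example. With GitHub, the argument to git clone would be the repository's URL instead of a local path.

```shell
set -e
work=$(mktemp -d)                        # scratch directory for the demo

# A bare repository stands in for the copy hosted on a server.
git init -q --bare "$work/server.git"

# Cloning creates a local copy that remembers where it came from.
# (Git warns that the repository is empty; that is expected here.)
git clone -q "$work/server.git" "$work/my-copy"

# List the remote named "origin" that the clone points back to.
git -C "$work/my-copy" remote -v
```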

Forking a repository

  • A fork is a copy of another user's repository that you manage and that lives in your account.

  • Forks let you make changes to a project without affecting the original repository.

  • You can fetch updates from or submit changes to the original repository with pull requests.
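Forks themselves are created on the GitHub website, but fetching updates from the original repository can also be done from the command line. The sketch below simulates this offline with local directories: original stands in for the other user's repository, fork.git for the fork on your account, and local for your working copy. All names, including the remote name upstream (a common convention, not a requirement), are assumptions for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "original" stands in for another user's repository.
git init -q original
cd original
git symbolic-ref HEAD refs/heads/master   # name the first branch master
git config user.name "Original Author"    # made-up identity for the demo
git config user.email "author@example.com"
echo "week 1" > notes.txt
git add notes.txt
git commit -q -m "Add week 1 notes"
cd "$work"

git clone -q --bare original fork.git     # stand-in for the fork on your account
git clone -q fork.git local               # your working copy of the fork
cd local
git remote add upstream "$work/original"  # link back to the original repository

# The original moves on after you forked it.
(cd "$work/original"; echo "week 2" >> notes.txt; git commit -q -am "Add week 2 notes")

# Fetch the new commits and fast-forward your copy to include them.
git fetch -q upstream
git merge -q --ff-only upstream/master
cat notes.txt
```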

Forking the course repository

Commit to Master

  • A commit, or "revision", is an individual change to a file (or set of files).
  • It's like saving a file, except that every save creates a unique ID. A commit also contains a description of what has changed.
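A minimal sketch of making commits from the command line. The file name analysis.R, the commit messages, and the identity are hypothetical; the repository lives in a scratch directory so the example does not touch any real project.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Demo Student"            # made-up identity, stored for this repository only
git config user.email "student@example.com"

echo 'x <- c(1, 2, 3)' > analysis.R
git add analysis.R                             # stage the new file
git commit -q -m "Add first analysis script"   # the message describes the change

echo 'mean(x)' >> analysis.R
git commit -q -am "Compute the mean of x"      # -a stages changes to tracked files

git log --oneline                              # one line per commit: short ID and message
```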

Upload to Remote (Push)

  • Pushing refers to sending your committed changes to a remote repository such as GitHub.com
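A minimal sketch of a push, runnable offline: the bare repository remote.git stands in for a repository on GitHub, and project is the local working copy. Both names, the file, and the identity are assumptions for illustration. With GitHub, the remote would be the repository's URL.

```shell
set -e
work=$(mktemp -d)
cd "$work"

git init -q --bare remote.git                  # stand-in for the remote repository
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master        # name the first branch master
git config user.name "Demo Student"            # made-up identity for the demo
git config user.email "student@example.com"
git remote add origin "$work/remote.git"

echo "# Homework 1" > README.md
git add README.md
git commit -q -m "Add README"

# Send the committed changes to the remote; -u remembers it for later pushes.
git push -q -u origin master

# The commit is now visible in the remote repository.
git --git-dir="$work/remote.git" log --oneline master
```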

Changing files online

  • You can change files directly through the github.com website. Click the edit button, make your change, and then commit.

GitHub - Branching (optional for this course)

GitHub Flow - Create a branch

  • Try out ideas for your project by creating a branch.
  • Changes you make on a branch don't affect the master branch, so you're free to experiment and commit changes.
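A minimal sketch showing that work on a branch leaves master untouched. The branch name new-plot and the file report.txt are hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master        # name the first branch master
git config user.name "Demo Student"            # made-up identity for the demo
git config user.email "student@example.com"

echo "baseline" > report.txt
git add report.txt
git commit -q -m "Add baseline report"

git checkout -q -b new-plot                    # create the branch and switch to it
echo "experimental plot" >> report.txt
git commit -q -am "Try an experimental plot"

git checkout -q master                         # switch back to master
cat report.txt                                 # still contains only "baseline"
```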

GitHub Flow - Add commits

  • Whenever you add, edit, or delete a file, you record the change as a commit and add it to your branch.
  • Keeps track of your progress as you work on a branch.
  • Transparent history of your work with an associated commit message (i.e., a description of your change).
  • Allows you to roll back if things go awry.
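One way to roll back, sketched under the assumption that the last commit was a mistake: git revert adds a new commit that undoes it, so the history stays intact. The file settings.txt and its contents are made up for this example; other rollback approaches exist.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Demo Student"            # made-up identity for the demo
git config user.email "student@example.com"

echo "alpha = 0.05" > settings.txt
git add settings.txt
git commit -q -m "Set alpha to 0.05"

echo "alpha = 0.50" > settings.txt
git commit -q -am "Change alpha (a mistake)"

# Roll back by adding a commit that undoes the last one.
git revert --no-edit HEAD
cat settings.txt                               # back to "alpha = 0.05"
git log --oneline                              # three commits: both changes plus the revert
```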

GitHub Flow - Pull Request

  • Pull Requests initiate discussion about your commits.
  • Anyone can see exactly what changes would be merged if they accept your request.
  • Use GitHub's @mention system in your Pull Request message to ask for feedback from your team.

GitHub Flow - Discuss and Review

  • Discuss and review the changes.
  • Continue to fix code and push the changes as new commits.
  • GitHub will show your new commits and any additional feedback you may receive in the unified Pull Request view.

GitHub Flow - Deploy

  • Check your branch to verify it works.
  • If your branch causes issues, you can roll it back by deploying the existing master again.

GitHub Flow - Merge

  • Merge your code into the master branch.
  • Pull Requests preserve a record of the historical changes to your code. Because they're searchable, they let anyone go back in time to understand why and how a decision was made.
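On GitHub the merge happens through the Pull Request page. As a rough local equivalent, the sketch below merges a branch into master from the command line. The branch name results-section and the file paper.txt are hypothetical, and --no-ff is used here only so that the merge is recorded as its own commit.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master        # name the first branch master
git config user.name "Demo Student"            # made-up identity for the demo
git config user.email "student@example.com"

echo "Introduction" > paper.txt
git add paper.txt
git commit -q -m "Add introduction"

git checkout -q -b results-section             # do the work on a branch
echo "Results" >> paper.txt
git commit -q -am "Add results section"

git checkout -q master                         # return to master and merge the branch
git merge -q --no-ff -m "Merge branch 'results-section'" results-section

git log --oneline --graph                      # shows the branch joining master
```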